This notebook is a template with each step that you need to complete for the project.
Please fill in your code where there are explicit ? markers in the notebook. You are welcome to add more cells and code as you see fit.
Once you have completed all the code implementations, please export your notebook as an HTML file so the reviewers can view your code. Make sure all cell outputs are rendered correctly.
File-> Export Notebook As... -> Export Notebook as HTML
There is a writeup to complete as well after all code implementation is done. Please answer all questions and attach the necessary tables and charts. You can complete the writeup in either markdown or PDF.
Completing the code template and writeup template will cover all of the rubric points for this project.
The rubric contains "Stand Out Suggestions" for enhancing the project beyond the minimum requirements. The stand out suggestions are optional. If you decide to pursue the "stand out suggestions", you can include the code in this notebook and also discuss the results in the writeup file.
Below is an example of the steps to get the API username and key. Each student will have their own username and key.
Open the downloaded kaggle.json file and use the username and key inside it.
ml.t3.medium instance (2 vCPU + 4 GiB)
Python 3 (MXNet 1.8 Python 3.7 CPU Optimized)
!pip install -U pip
!pip install -U setuptools wheel
!pip install -U "mxnet<2.0.0" bokeh==2.0.1
!pip install autogluon --no-cache-dir
# Without --no-cache-dir, smaller aws instances may have trouble installing
# create the .kaggle directory and an empty kaggle.json file
!mkdir -p /root/.kaggle
!touch /root/.kaggle/kaggle.json
!chmod 600 /root/.kaggle/kaggle.json
mkdir: /root: Read-only file system
touch: /root/.kaggle/kaggle.json: No such file or directory
chmod: /root/.kaggle/kaggle.json: No such file or directory
As you can see above, my root is read-only since I am working on a work laptop.
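If `/root` is not writable, the same setup can be done against the current user's home directory instead. A minimal sketch using only the standard library (the `save_kaggle_token` helper and its arguments are my own names, not part of the template):

```python
import json
import os
import stat

def save_kaggle_token(username: str, key: str, kaggle_dir: str = None) -> str:
    """Write a kaggle.json API token into a writable .kaggle directory."""
    if kaggle_dir is None:
        kaggle_dir = os.path.join(os.path.expanduser("~"), ".kaggle")
    os.makedirs(kaggle_dir, exist_ok=True)
    token_path = os.path.join(kaggle_dir, "kaggle.json")
    with open(token_path, "w") as f:
        json.dump({"username": username, "key": key}, f)
    # The kaggle CLI warns about tokens readable by other users, so tighten perms
    os.chmod(token_path, stat.S_IRUSR | stat.S_IWUSR)  # 0o600
    return token_path
```

The kaggle CLI looks in `~/.kaggle` by default, and the `KAGGLE_CONFIG_DIR` environment variable can point it somewhere else if needed.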
# Fill in your user name and key from creating the kaggle account and API token file
import json
kaggle_username = "FILL_IN_USERNAME"
kaggle_key = "FILL_IN_KEY"
# Save the API token to the kaggle.json file
with open("/root/.kaggle/kaggle.json", "w") as f:
    f.write(json.dumps({"username": kaggle_username, "key": kaggle_key}))
# Download the dataset, it will be in a .zip file so you'll need to unzip it as well.
!kaggle competitions download -c bike-sharing-demand
# If you already downloaded it you can use the -o command to overwrite the file
!unzip -o bike-sharing-demand.zip
import pandas as pd
from autogluon.tabular import TabularPredictor
import autogluon
# Create the train dataset in pandas by reading the csv
# Set the parsing of the datetime column so you can use some of the `dt` features in pandas later
train = pd.read_csv('bike-sharing-demand/train.csv', parse_dates=['datetime']).drop(columns=['casual','registered'])
train.head()
| datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | count | |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 00:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 16 |
| 1 | 2011-01-01 01:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 40 |
| 2 | 2011-01-01 02:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 32 |
| 3 | 2011-01-01 03:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 13 |
| 4 | 2011-01-01 04:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 1 |
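With the datetime column parsed, the pandas `dt` accessor makes it easy to derive extra features such as the hour of day later on. A small self-contained sketch on toy data (not the actual competition file):

```python
import pandas as pd

# Toy frame mirroring the first rows of the train set
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2011-01-01 00:00:00",
        "2011-01-01 01:00:00",
        "2011-01-01 02:00:00",
    ]),
    "count": [16, 40, 32],
})

# The dt accessor only works on parsed datetime64 columns,
# which is why parse_dates=['datetime'] matters in read_csv
df["hour"] = df["datetime"].dt.hour
df["dayofweek"] = df["datetime"].dt.dayofweek  # Monday=0 ... Sunday=6
print(df[["hour", "dayofweek"]])
```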
# Simple output of the train dataset to view the min/max/variation of the dataset features.
train.describe()
| season | holiday | workingday | weather | temp | atemp | humidity | windspeed | count | |
|---|---|---|---|---|---|---|---|---|---|
| count | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.00000 | 10886.000000 | 10886.000000 | 10886.000000 | 10886.000000 |
| mean | 2.506614 | 0.028569 | 0.680875 | 1.418427 | 20.23086 | 23.655084 | 61.886460 | 12.799395 | 191.574132 |
| std | 1.116174 | 0.166599 | 0.466159 | 0.633839 | 7.79159 | 8.474601 | 19.245033 | 8.164537 | 181.144454 |
| min | 1.000000 | 0.000000 | 0.000000 | 1.000000 | 0.82000 | 0.760000 | 0.000000 | 0.000000 | 1.000000 |
| 25% | 2.000000 | 0.000000 | 0.000000 | 1.000000 | 13.94000 | 16.665000 | 47.000000 | 7.001500 | 42.000000 |
| 50% | 3.000000 | 0.000000 | 1.000000 | 1.000000 | 20.50000 | 24.240000 | 62.000000 | 12.998000 | 145.000000 |
| 75% | 4.000000 | 0.000000 | 1.000000 | 2.000000 | 26.24000 | 31.060000 | 77.000000 | 16.997900 | 284.000000 |
| max | 4.000000 | 1.000000 | 1.000000 | 4.000000 | 41.00000 | 45.455000 | 100.000000 | 56.996900 | 977.000000 |
# Create the test pandas dataframe in pandas by reading the csv, remember to parse the datetime!
test = pd.read_csv('bike-sharing-demand/test.csv', parse_dates=['datetime'])
test.head()
| datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-20 00:00:00 | 1 | 0 | 1 | 1 | 10.66 | 11.365 | 56 | 26.0027 |
| 1 | 2011-01-20 01:00:00 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 |
| 2 | 2011-01-20 02:00:00 | 1 | 0 | 1 | 1 | 10.66 | 13.635 | 56 | 0.0000 |
| 3 | 2011-01-20 03:00:00 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 |
| 4 | 2011-01-20 04:00:00 | 1 | 0 | 1 | 1 | 10.66 | 12.880 | 56 | 11.0014 |
# Same as the train and test datasets: read in the sample submission csv
submission = pd.read_csv('bike-sharing-demand/sampleSubmission.csv')
submission.head()
| datetime | count | |
|---|---|---|
| 0 | 2011-01-20 00:00:00 | 0 |
| 1 | 2011-01-20 01:00:00 | 0 |
| 2 | 2011-01-20 02:00:00 | 0 |
| 3 | 2011-01-20 03:00:00 | 0 |
| 4 | 2011-01-20 04:00:00 | 0 |
Requirements:
- We are predicting `count`, so it is the label we are setting.
- Ignore the `casual` and `registered` columns as they are also not present in the test dataset.
- Use `root_mean_squared_error` as the metric for evaluation.
- Use the `best_quality` preset to focus on creating the best model.

%%time
predictor = TabularPredictor(label='count').fit(train_data=train,
time_limit=600,
presets="best_quality")
No path specified. Models will be saved in: "AutogluonModels/ag-20220401_142635/"
Presets specified: ['best_quality']
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20220401_142635/"
AutoGluon Version: 0.4.0
Python Version: 3.7.11
Operating System: Darwin
Train Data Rows: 10886
Train Data Columns: 9
Label Column: count
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
Label info (max, min, mean, stddev): (977, 1, 191.57413, 181.14445)
If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 6231.84 MB
Train Data (Original) Memory Usage: 1.52 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 2 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting DatetimeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 5 | ['season', 'holiday', 'workingday', 'weather', 'humidity']
('object', ['datetime_as_object']) : 1 | ['datetime']
Types of features in processed data (raw dtype, special dtypes):
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 3 | ['season', 'weather', 'humidity']
('int', ['bool']) : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
0.1s = Fit runtime
9 features in original data used to generate 13 features in processed data.
Train Data (Processed) Memory Usage: 0.98 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.09s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.84s of the 599.9s of remaining time.
-101.5882 = Validation score (root_mean_squared_error)
0.02s = Training runtime
0.11s = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 399.59s of the 599.66s of remaining time.
-84.1464 = Validation score (root_mean_squared_error)
0.02s = Training runtime
0.11s = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 399.34s of the 599.41s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 131.684
[2000] valid_set's rmse: 130.67
[3000] valid_set's rmse: 130.626
[1000] valid_set's rmse: 135.592
[1000] valid_set's rmse: 133.481
[2000] valid_set's rmse: 132.323
[3000] valid_set's rmse: 131.618
[4000] valid_set's rmse: 131.443
[5000] valid_set's rmse: 131.265
[6000] valid_set's rmse: 131.277
[7000] valid_set's rmse: 131.443
[1000] valid_set's rmse: 128.503
[2000] valid_set's rmse: 127.654
[3000] valid_set's rmse: 127.227
[4000] valid_set's rmse: 127.105
[1000] valid_set's rmse: 134.135
[2000] valid_set's rmse: 132.272
[3000] valid_set's rmse: 131.286
[4000] valid_set's rmse: 130.752
[5000] valid_set's rmse: 130.363
[6000] valid_set's rmse: 130.509
[1000] valid_set's rmse: 136.168
[2000] valid_set's rmse: 135.138
[3000] valid_set's rmse: 135.029
[1000] valid_set's rmse: 134.061
[2000] valid_set's rmse: 133.034
[3000] valid_set's rmse: 132.182
[4000] valid_set's rmse: 131.997
[5000] valid_set's rmse: 131.643
[6000] valid_set's rmse: 131.504
[7000] valid_set's rmse: 131.574
[1000] valid_set's rmse: 132.912
[2000] valid_set's rmse: 131.703
[3000] valid_set's rmse: 131.117
[4000] valid_set's rmse: 130.82
[5000] valid_set's rmse: 130.673
[6000] valid_set's rmse: 130.708
-131.4609 = Validation score (root_mean_squared_error)
23.51s = Training runtime
0.66s = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 373.57s of the 573.64s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 130.818
[1000] valid_set's rmse: 133.204
[1000] valid_set's rmse: 130.928
[1000] valid_set's rmse: 126.846
[1000] valid_set's rmse: 131.426
[1000] valid_set's rmse: 133.655
[1000] valid_set's rmse: 132.155
[1000] valid_set's rmse: 130.62
-131.0542 = Validation score (root_mean_squared_error)
7.06s = Training runtime
0.14s = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 365.91s of the 565.97s of remaining time.
-116.6217 = Validation score (root_mean_squared_error)
2.13s = Training runtime
0.4s = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 362.93s of the 563.0s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, early stopping on iteration 8469.
-130.4612 = Validation score (root_mean_squared_error)
203.14s = Training runtime
0.03s = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 159.68s of the 359.75s of remaining time.
-124.6372 = Validation score (root_mean_squared_error)
1.12s = Training runtime
0.38s = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 157.76s of the 357.83s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-135.8425 = Validation score (root_mean_squared_error)
70.72s = Training runtime
0.17s = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 86.73s of the 286.8s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-131.6247 = Validation score (root_mean_squared_error)
7.77s = Training runtime
0.05s = Validation runtime
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 78.78s of the 278.85s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, stopping training early. (Stopping on epoch 24)
Ran out of time, stopping training early. (Stopping on epoch 25)
Ran out of time, stopping training early. (Stopping on epoch 26)
Ran out of time, stopping training early. (Stopping on epoch 26)
Ran out of time, stopping training early. (Stopping on epoch 28)
Ran out of time, stopping training early. (Stopping on epoch 30)
Ran out of time, stopping training early. (Stopping on epoch 32)
-141.8696 = Validation score (root_mean_squared_error)
75.34s = Training runtime
0.16s = Validation runtime
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 3.24s of the 203.31s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, early stopping on iteration 122. Best iteration is:
[122] valid_set's rmse: 130.763
Ran out of time, early stopping on iteration 118. Best iteration is:
[118] valid_set's rmse: 135.088
Ran out of time, early stopping on iteration 121. Best iteration is:
[120] valid_set's rmse: 133.515
Ran out of time, early stopping on iteration 127. Best iteration is:
[127] valid_set's rmse: 128.059
Ran out of time, early stopping on iteration 129. Best iteration is:
[129] valid_set's rmse: 131.506
Ran out of time, early stopping on iteration 112. Best iteration is:
[112] valid_set's rmse: 134.815
Ran out of time, early stopping on iteration 127. Best iteration is:
[127] valid_set's rmse: 133.262
Ran out of time, early stopping on iteration 153. Best iteration is:
[153] valid_set's rmse: 131.555
-132.3385 = Validation score (root_mean_squared_error)
2.99s = Training runtime
0.05s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 200.05s of remaining time.
-84.1464 = Validation score (root_mean_squared_error)
0.47s = Training runtime
0.0s = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 199.57s of the 199.54s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 60.3134
[2000] valid_set's rmse: 59.4888
[1000] valid_set's rmse: 60.6085
[2000] valid_set's rmse: 59.793
[1000] valid_set's rmse: 63.0976
[2000] valid_set's rmse: 62.0176
[1000] valid_set's rmse: 64.9102
[2000] valid_set's rmse: 63.0881
[3000] valid_set's rmse: 63.2331
[1000] valid_set's rmse: 57.7704
[2000] valid_set's rmse: 56.7398
[1000] valid_set's rmse: 62.6632
[2000] valid_set's rmse: 62.0369
[1000] valid_set's rmse: 62.2203
[2000] valid_set's rmse: 61.2668
[1000] valid_set's rmse: 58.5257
-60.2691 = Validation score (root_mean_squared_error)
15.02s = Training runtime
0.26s = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 183.48s of the 183.46s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-54.9037 = Validation score (root_mean_squared_error)
4.51s = Training runtime
0.05s = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 178.8s of the 178.78s of remaining time.
-53.3378 = Validation score (root_mean_squared_error)
5.88s = Training runtime
0.43s = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 172.07s of the 172.05s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-55.3445 = Validation score (root_mean_squared_error)
40.8s = Training runtime
0.03s = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 131.19s of the 131.17s of remaining time.
-53.9369 = Validation score (root_mean_squared_error)
1.85s = Training runtime
0.48s = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L2 ... Training model for up to 128.44s of the 128.42s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-50.9832 = Validation score (root_mean_squared_error)
82.26s = Training runtime
0.23s = Validation runtime
Fitting model: XGBoost_BAG_L2 ... Training model for up to 45.79s of the 45.77s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-55.0732 = Validation score (root_mean_squared_error)
8.37s = Training runtime
0.04s = Validation runtime
Fitting model: NeuralNetTorch_BAG_L2 ... Training model for up to 37.27s of the 37.25s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, stopping training early. (Stopping on epoch 9)
Ran out of time, stopping training early. (Stopping on epoch 10)
Ran out of time, stopping training early. (Stopping on epoch 10)
Ran out of time, stopping training early. (Stopping on epoch 9)
Ran out of time, stopping training early. (Stopping on epoch 11)
Ran out of time, stopping training early. (Stopping on epoch 11)
Ran out of time, stopping training early. (Stopping on epoch 12)
Ran out of time, stopping training early. (Stopping on epoch 16)
-71.5629 = Validation score (root_mean_squared_error)
35.29s = Training runtime
0.24s = Validation runtime
Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 1.69s of the 1.66s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, early stopping on iteration 8. Best iteration is:
[8] valid_set's rmse: 145.221
Ran out of time, early stopping on iteration 11. Best iteration is:
[11] valid_set's rmse: 137.729
Ran out of time, early stopping on iteration 11. Best iteration is:
[11] valid_set's rmse: 137.48
Ran out of time, early stopping on iteration 14. Best iteration is:
[14] valid_set's rmse: 133.114
Ran out of time, early stopping on iteration 16. Best iteration is:
[16] valid_set's rmse: 118.625
Ran out of time, early stopping on iteration 16. Best iteration is:
[16] valid_set's rmse: 122.549
Ran out of time, early stopping on iteration 18. Best iteration is:
[18] valid_set's rmse: 113.061
Ran out of time, early stopping on iteration 26. Best iteration is:
[26] valid_set's rmse: 93.4056
-126.1366 = Validation score (root_mean_squared_error)
1.61s = Training runtime
0.02s = Validation runtime
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the -0.03s of remaining time.
-50.0823 = Validation score (root_mean_squared_error)
0.39s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 600.44s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20220401_142635/")
CPU times: user 23min 44s, sys: 1min 5s, total: 24min 50s
Wall time: 10min
predictor.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L3 -50.082320 3.382371 484.173741 0.000456 0.386664 3 True 22
1 NeuralNetFastAI_BAG_L2 -50.983240 2.472058 476.060783 0.227915 82.256011 2 True 18
2 RandomForestMSE_BAG_L2 -53.337840 2.675888 399.685822 0.431745 5.881051 2 True 15
3 ExtraTreesMSE_BAG_L2 -53.936948 2.722255 395.650014 0.478112 1.845243 2 True 17
4 LightGBM_BAG_L2 -54.903747 2.298769 398.316719 0.054626 4.511948 2 True 14
5 XGBoost_BAG_L2 -55.073222 2.286634 402.179717 0.042491 8.374946 2 True 19
6 CatBoost_BAG_L2 -55.344477 2.272568 434.602493 0.028425 40.797722 2 True 16
7 LightGBMXT_BAG_L2 -60.269119 2.506013 408.828436 0.261870 15.023664 2 True 13
8 NeuralNetTorch_BAG_L2 -71.562949 2.481272 429.090993 0.237129 35.286222 2 True 20
9 KNeighborsDist_BAG_L1 -84.146423 0.108243 0.020491 0.108243 0.020491 1 True 2
10 WeightedEnsemble_L2 -84.146423 0.108666 0.489467 0.000423 0.468976 2 True 12
11 KNeighborsUnif_BAG_L1 -101.588176 0.106599 0.016221 0.106599 0.016221 1 True 1
12 RandomForestMSE_BAG_L1 -116.621736 0.398140 2.127761 0.398140 2.127761 1 True 5
13 ExtraTreesMSE_BAG_L1 -124.637158 0.376534 1.123487 0.376534 1.123487 1 True 7
14 LightGBMLarge_BAG_L2 -126.136620 2.267577 395.416492 0.023434 1.611720 2 True 21
15 CatBoost_BAG_L1 -130.461205 0.028665 203.138522 0.028665 203.138522 1 True 6
16 LightGBM_BAG_L1 -131.054162 0.138670 7.056198 0.138670 7.056198 1 True 4
17 LightGBMXT_BAG_L1 -131.460909 0.655655 23.510633 0.655655 23.510633 1 True 3
18 XGBoost_BAG_L1 -131.624665 0.052393 7.769693 0.052393 7.769693 1 True 9
19 LightGBMLarge_BAG_L1 -132.338466 0.050611 2.985321 0.050611 2.985321 1 True 11
20 NeuralNetFastAI_BAG_L1 -135.842475 0.173030 70.715646 0.173030 70.715646 1 True 8
21 NeuralNetTorch_BAG_L1 -141.869626 0.155603 75.340798 0.155603 75.340798 1 True 10
Number of models trained: 22
Types of models trained:
{'StackerEnsembleModel_KNN', 'StackerEnsembleModel_XT', 'WeightedEnsembleModel', 'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_TabularNeuralNetTorch', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_XGBoost'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 3 | ['season', 'weather', 'humidity']
('int', ['bool']) : 2 | ['holiday', 'workingday']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20220401_142635/SummaryOfModels.html
*** End of fit() summary ***
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
'XGBoost_BAG_L1': 'StackerEnsembleModel_XGBoost',
'NeuralNetTorch_BAG_L1': 'StackerEnsembleModel_TabularNeuralNetTorch',
'LightGBMLarge_BAG_L1': 'StackerEnsembleModel_LGB',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
'NeuralNetFastAI_BAG_L2': 'StackerEnsembleModel_NNFastAiTabular',
'XGBoost_BAG_L2': 'StackerEnsembleModel_XGBoost',
'NeuralNetTorch_BAG_L2': 'StackerEnsembleModel_TabularNeuralNetTorch',
'LightGBMLarge_BAG_L2': 'StackerEnsembleModel_LGB',
'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
'model_performance': {'KNeighborsUnif_BAG_L1': -101.58817625927213,
'KNeighborsDist_BAG_L1': -84.14642264302962,
'LightGBMXT_BAG_L1': -131.46090891834504,
'LightGBM_BAG_L1': -131.054161598899,
'RandomForestMSE_BAG_L1': -116.62173601727898,
'CatBoost_BAG_L1': -130.46120460893414,
'ExtraTreesMSE_BAG_L1': -124.63715787314163,
'NeuralNetFastAI_BAG_L1': -135.84247471324838,
'XGBoost_BAG_L1': -131.62466543942023,
'NeuralNetTorch_BAG_L1': -141.86962641231048,
'LightGBMLarge_BAG_L1': -132.3384662656421,
'WeightedEnsemble_L2': -84.14642264302962,
'LightGBMXT_BAG_L2': -60.26911859830421,
'LightGBM_BAG_L2': -54.90374721128587,
'RandomForestMSE_BAG_L2': -53.33783954632345,
'CatBoost_BAG_L2': -55.34447699987185,
'ExtraTreesMSE_BAG_L2': -53.93694753481746,
'NeuralNetFastAI_BAG_L2': -50.983239845091106,
'XGBoost_BAG_L2': -55.07322158022818,
'NeuralNetTorch_BAG_L2': -71.56294902297918,
'LightGBMLarge_BAG_L2': -126.13661969968844,
'WeightedEnsemble_L3': -50.08232035450195},
'model_best': 'WeightedEnsemble_L3',
'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20220401_142635/models/KNeighborsUnif_BAG_L1/',
'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20220401_142635/models/KNeighborsDist_BAG_L1/',
'LightGBMXT_BAG_L1': 'AutogluonModels/ag-20220401_142635/models/LightGBMXT_BAG_L1/',
'LightGBM_BAG_L1': 'AutogluonModels/ag-20220401_142635/models/LightGBM_BAG_L1/',
'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20220401_142635/models/RandomForestMSE_BAG_L1/',
'CatBoost_BAG_L1': 'AutogluonModels/ag-20220401_142635/models/CatBoost_BAG_L1/',
'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20220401_142635/models/ExtraTreesMSE_BAG_L1/',
'NeuralNetFastAI_BAG_L1': 'AutogluonModels/ag-20220401_142635/models/NeuralNetFastAI_BAG_L1/',
'XGBoost_BAG_L1': 'AutogluonModels/ag-20220401_142635/models/XGBoost_BAG_L1/',
'NeuralNetTorch_BAG_L1': 'AutogluonModels/ag-20220401_142635/models/NeuralNetTorch_BAG_L1/',
'LightGBMLarge_BAG_L1': 'AutogluonModels/ag-20220401_142635/models/LightGBMLarge_BAG_L1/',
'WeightedEnsemble_L2': 'AutogluonModels/ag-20220401_142635/models/WeightedEnsemble_L2/',
'LightGBMXT_BAG_L2': 'AutogluonModels/ag-20220401_142635/models/LightGBMXT_BAG_L2/',
'LightGBM_BAG_L2': 'AutogluonModels/ag-20220401_142635/models/LightGBM_BAG_L2/',
'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20220401_142635/models/RandomForestMSE_BAG_L2/',
'CatBoost_BAG_L2': 'AutogluonModels/ag-20220401_142635/models/CatBoost_BAG_L2/',
'ExtraTreesMSE_BAG_L2': 'AutogluonModels/ag-20220401_142635/models/ExtraTreesMSE_BAG_L2/',
'NeuralNetFastAI_BAG_L2': 'AutogluonModels/ag-20220401_142635/models/NeuralNetFastAI_BAG_L2/',
'XGBoost_BAG_L2': 'AutogluonModels/ag-20220401_142635/models/XGBoost_BAG_L2/',
'NeuralNetTorch_BAG_L2': 'AutogluonModels/ag-20220401_142635/models/NeuralNetTorch_BAG_L2/',
'LightGBMLarge_BAG_L2': 'AutogluonModels/ag-20220401_142635/models/LightGBMLarge_BAG_L2/',
'WeightedEnsemble_L3': 'AutogluonModels/ag-20220401_142635/models/WeightedEnsemble_L3/'},
'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.016221046447753906,
'KNeighborsDist_BAG_L1': 0.02049112319946289,
'LightGBMXT_BAG_L1': 23.510632753372192,
'LightGBM_BAG_L1': 7.056197881698608,
'RandomForestMSE_BAG_L1': 2.127761125564575,
'CatBoost_BAG_L1': 203.13852214813232,
'ExtraTreesMSE_BAG_L1': 1.1234869956970215,
'NeuralNetFastAI_BAG_L1': 70.7156457901001,
'XGBoost_BAG_L1': 7.769693374633789,
'NeuralNetTorch_BAG_L1': 75.34079813957214,
'LightGBMLarge_BAG_L1': 2.985320806503296,
'WeightedEnsemble_L2': 0.4689757823944092,
'LightGBMXT_BAG_L2': 15.023664474487305,
'LightGBM_BAG_L2': 4.511948108673096,
'RandomForestMSE_BAG_L2': 5.881051063537598,
'CatBoost_BAG_L2': 40.79772210121155,
'ExtraTreesMSE_BAG_L2': 1.845242977142334,
'NeuralNetFastAI_BAG_L2': 82.25601148605347,
'XGBoost_BAG_L2': 8.374945640563965,
'NeuralNetTorch_BAG_L2': 35.286221981048584,
'LightGBMLarge_BAG_L2': 1.611720323562622,
'WeightedEnsemble_L3': 0.38666391372680664},
'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.10659933090209961,
'KNeighborsDist_BAG_L1': 0.10824298858642578,
'LightGBMXT_BAG_L1': 0.6556551456451416,
'LightGBM_BAG_L1': 0.1386704444885254,
'RandomForestMSE_BAG_L1': 0.39813971519470215,
'CatBoost_BAG_L1': 0.028664588928222656,
'ExtraTreesMSE_BAG_L1': 0.37653398513793945,
'NeuralNetFastAI_BAG_L1': 0.17302966117858887,
'XGBoost_BAG_L1': 0.05239295959472656,
'NeuralNetTorch_BAG_L1': 0.15560269355773926,
'LightGBMLarge_BAG_L1': 0.05061149597167969,
'WeightedEnsemble_L2': 0.0004229545593261719,
'LightGBMXT_BAG_L2': 0.2618696689605713,
'LightGBM_BAG_L2': 0.05462646484375,
'RandomForestMSE_BAG_L2': 0.4317450523376465,
'CatBoost_BAG_L2': 0.028424501419067383,
'ExtraTreesMSE_BAG_L2': 0.47811198234558105,
'NeuralNetFastAI_BAG_L2': 0.22791481018066406,
'XGBoost_BAG_L2': 0.04249095916748047,
'NeuralNetTorch_BAG_L2': 0.23712873458862305,
'LightGBMLarge_BAG_L2': 0.023433685302734375,
'WeightedEnsemble_L3': 0.0004558563232421875},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'KNeighborsDist_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'LightGBMXT_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'XGBoost_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'NeuralNetTorch_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMLarge_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'NeuralNetFastAI_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'XGBoost_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'NeuralNetTorch_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMLarge_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard':                     model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0      WeightedEnsemble_L3   -50.082320  3.382371  484.173741  0.000456    0.386664  3  True  22
1   NeuralNetFastAI_BAG_L2   -50.983240  2.472058  476.060783  0.227915   82.256011  2  True  18
2   RandomForestMSE_BAG_L2   -53.337840  2.675888  399.685822  0.431745    5.881051  2  True  15
3     ExtraTreesMSE_BAG_L2   -53.936948  2.722255  395.650014  0.478112    1.845243  2  True  17
4          LightGBM_BAG_L2   -54.903747  2.298769  398.316719  0.054626    4.511948  2  True  14
5           XGBoost_BAG_L2   -55.073222  2.286634  402.179717  0.042491    8.374946  2  True  19
6          CatBoost_BAG_L2   -55.344477  2.272568  434.602493  0.028425   40.797722  2  True  16
7        LightGBMXT_BAG_L2   -60.269119  2.506013  408.828436  0.261870   15.023664  2  True  13
8    NeuralNetTorch_BAG_L2   -71.562949  2.481272  429.090993  0.237129   35.286222  2  True  20
9    KNeighborsDist_BAG_L1   -84.146423  0.108243    0.020491  0.108243    0.020491  1  True   2
10     WeightedEnsemble_L2   -84.146423  0.108666    0.489467  0.000423    0.468976  2  True  12
11   KNeighborsUnif_BAG_L1  -101.588176  0.106599    0.016221  0.106599    0.016221  1  True   1
12  RandomForestMSE_BAG_L1  -116.621736  0.398140    2.127761  0.398140    2.127761  1  True   5
13    ExtraTreesMSE_BAG_L1  -124.637158  0.376534    1.123487  0.376534    1.123487  1  True   7
14    LightGBMLarge_BAG_L2  -126.136620  2.267577  395.416492  0.023434    1.611720  2  True  21
15         CatBoost_BAG_L1  -130.461205  0.028665  203.138522  0.028665  203.138522  1  True   6
16         LightGBM_BAG_L1  -131.054162  0.138670    7.056198  0.138670    7.056198  1  True   4
17       LightGBMXT_BAG_L1  -131.460909  0.655655   23.510633  0.655655   23.510633  1  True   3
18          XGBoost_BAG_L1  -131.624665  0.052393    7.769693  0.052393    7.769693  1  True   9
19    LightGBMLarge_BAG_L1  -132.338466  0.050611    2.985321  0.050611    2.985321  1  True  11
20  NeuralNetFastAI_BAG_L1  -135.842475  0.173030   70.715646  0.173030   70.715646  1  True   8
21   NeuralNetTorch_BAG_L1  -141.869626  0.155603   75.340798  0.155603   75.340798  1  True  10  }
predictions = predictor.predict(test)
predictions.head()
0    26.739716
1    40.376095
2    45.714859
3    49.119461
4    51.235603
Name: count, dtype: float32
# Describe the `predictions` series to see if there are any negative values
predictions.describe()
count    6493.000000
mean       99.762779
std        87.836624
min        -1.735354
25%        22.437004
50%        68.314713
75%       165.213531
max       354.473633
Name: count, dtype: float64
# How many negative values do we have?
predictions.where(predictions < 0).dropna()
211   -0.525363
212   -0.154420
213   -1.735354
214   -1.734224
215   -1.040829
216   -1.038325
217   -0.334664
Name: count, dtype: float32
# Set them to zero
predictions = predictions.apply(lambda x: np.array(x).clip(min=0))
submission["count"] = predictions
submission.to_csv("submission_no_features.csv", index=False)
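As an aside, pandas' built-in `Series.clip` produces the same zero-flooring as the `apply`/`np.array` lambda above in a single vectorized call. A minimal sketch with made-up values (including a negative one like those seen in the describe output):

```python
import pandas as pd

# Hypothetical prediction values standing in for the real series
preds = pd.Series([26.74, -1.74, 0.0, 49.12], name="count")

# Floor all negatives at zero in one vectorized call
clipped = preds.clip(lower=0)

print(clipped.tolist())  # [26.74, 0.0, 0.0, 49.12]
```

This avoids the per-element Python overhead of `apply` and reads more directly as "no count below zero".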
# I work on a work laptop and have no permission to save kaggle.json to the root
# directory, so all submissions were done on the Kaggle website
!kaggle competitions submit -c bike-sharing-demand -f submission_no_features.csv -m "first raw submission"
Traceback (most recent call last):
File "/Users/saho/opt/anaconda3/envs/bike_share/bin/kaggle", line 5, in <module>
from kaggle.cli import main
File "/Users/saho/opt/anaconda3/envs/bike_share/lib/python3.7/site-packages/kaggle/__init__.py", line 23, in <module>
api.authenticate()
File "/Users/saho/opt/anaconda3/envs/bike_share/lib/python3.7/site-packages/kaggle/api/kaggle_api_extended.py", line 166, in authenticate
self.config_file, self.config_dir))
OSError: Could not find kaggle.json. Make sure it's located in /Users/saho/.kaggle. Or use the environment method.
My Submissions
# I work on a work laptop and have no permission to save kaggle.json to the root
# directory, so all submissions were done on the Kaggle website
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
Traceback (most recent call last):
File "/Users/saho/opt/anaconda3/envs/bike_share/bin/kaggle", line 5, in <module>
from kaggle.cli import main
File "/Users/saho/opt/anaconda3/envs/bike_share/lib/python3.7/site-packages/kaggle/__init__.py", line 23, in <module>
api.authenticate()
File "/Users/saho/opt/anaconda3/envs/bike_share/lib/python3.7/site-packages/kaggle/api/kaggle_api_extended.py", line 166, in authenticate
self.config_file, self.config_dir))
OSError: Could not find kaggle.json. Make sure it's located in /Users/saho/.kaggle. Or use the environment method.
1.85306
# Create a histogram of all features to show the distribution of each one relative to the data.
# This is part of the exploratory data analysis
train.hist()
array([[<AxesSubplot:title={'center':'season'}>,
<AxesSubplot:title={'center':'holiday'}>,
<AxesSubplot:title={'center':'workingday'}>],
[<AxesSubplot:title={'center':'weather'}>,
<AxesSubplot:title={'center':'temp'}>,
<AxesSubplot:title={'center':'atemp'}>],
[<AxesSubplot:title={'center':'humidity'}>,
<AxesSubplot:title={'center':'windspeed'}>,
<AxesSubplot:title={'center':'count'}>]], dtype=object)
# convert the datetime column to datetime dtype so we can extract its components
train['datetime'] = pd.to_datetime(train['datetime'])
test['datetime'] = pd.to_datetime(test['datetime'])
# create new features from the datetime components
train['year'] = train['datetime'].apply(lambda x: x.year)
test['year'] = test['datetime'].apply(lambda x: x.year)
train['month'] = train['datetime'].apply(lambda x: x.month)
test['month'] = test['datetime'].apply(lambda x: x.month)
train['day'] = train['datetime'].apply(lambda x: x.day)
test['day'] = test['datetime'].apply(lambda x: x.day)
train['hour'] = train['datetime'].apply(lambda x: x.hour)
test['hour'] = test['datetime'].apply(lambda x: x.hour)
train["season"] = train["season"].astype('category')
train["weather"] = train["weather"].astype('category')
test["season"] = test["season"].astype('category')
test["weather"] = test["weather"].astype('category')
# View our new features
train.head()
| datetime | season | holiday | workingday | weather | temp | atemp | humidity | windspeed | count | year | month | day | hour | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-01 00:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 81 | 0.0 | 16 | 2011 | 1 | 1 | 0 |
| 1 | 2011-01-01 01:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 40 | 2011 | 1 | 1 | 1 |
| 2 | 2011-01-01 02:00:00 | 1 | 0 | 0 | 1 | 9.02 | 13.635 | 80 | 0.0 | 32 | 2011 | 1 | 1 | 2 |
| 3 | 2011-01-01 03:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 13 | 2011 | 1 | 1 | 3 |
| 4 | 2011-01-01 04:00:00 | 1 | 0 | 0 | 1 | 9.84 | 14.395 | 75 | 0.0 | 1 | 2011 | 1 | 1 | 4 |
# View histogram of all features again now with the hour feature
train.hist()
array([[<AxesSubplot:title={'center':'datetime'}>,
<AxesSubplot:title={'center':'holiday'}>,
<AxesSubplot:title={'center':'workingday'}>],
[<AxesSubplot:title={'center':'temp'}>,
<AxesSubplot:title={'center':'atemp'}>,
<AxesSubplot:title={'center':'humidity'}>],
[<AxesSubplot:title={'center':'windspeed'}>,
<AxesSubplot:title={'center':'count'}>,
<AxesSubplot:title={'center':'year'}>],
[<AxesSubplot:title={'center':'month'}>,
<AxesSubplot:title={'center':'day'}>,
<AxesSubplot:title={'center':'hour'}>]], dtype=object)
%%time
predictor_new_features = TabularPredictor(label='count').fit(train_data=train,
time_limit=600,
presets="best_quality")
No path specified. Models will be saved in: "AutogluonModels/ag-20220401_143648/"
Presets specified: ['best_quality']
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20220401_143648/"
AutoGluon Version: 0.4.0
Python Version: 3.7.11
Operating System: Darwin
Train Data Rows: 10886
Train Data Columns: 13
Label Column: count
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
Label info (max, min, mean, stddev): (977, 1, 191.57413, 181.14445)
If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 5898.84 MB
Train Data (Original) Memory Usage: 0.98 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Fitting DatetimeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('datetime', []) : 1 | ['datetime']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 7 | ['holiday', 'workingday', 'humidity', 'year', 'month', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 4 | ['humidity', 'month', 'day', 'hour']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
0.1s = Fit runtime
13 features in original data used to generate 17 features in processed data.
Train Data (Processed) Memory Usage: 1.1 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.13s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
Fitting 11 L1 models ...
Fitting model: KNeighborsUnif_BAG_L1 ... Training model for up to 399.81s of the 599.87s of remaining time.
-101.5882 = Validation score (root_mean_squared_error)
0.02s = Training runtime
0.11s = Validation runtime
Fitting model: KNeighborsDist_BAG_L1 ... Training model for up to 399.56s of the 599.62s of remaining time.
-84.1464 = Validation score (root_mean_squared_error)
0.02s = Training runtime
0.11s = Validation runtime
Fitting model: LightGBMXT_BAG_L1 ... Training model for up to 399.31s of the 599.36s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 35.1395 [2000] valid_set's rmse: 33.4443 [3000] valid_set's rmse: 33.2224 [4000] valid_set's rmse: 33.1502 [5000] valid_set's rmse: 33.1935
[1000] valid_set's rmse: 36.5025 [2000] valid_set's rmse: 34.5423 [3000] valid_set's rmse: 34.1505 [4000] valid_set's rmse: 33.9807 [5000] valid_set's rmse: 33.9578
[1000] valid_set's rmse: 35.9928 [2000] valid_set's rmse: 34.2255 [3000] valid_set's rmse: 33.6889 [4000] valid_set's rmse: 33.4699 [5000] valid_set's rmse: 33.3243 [6000] valid_set's rmse: 33.268 [7000] valid_set's rmse: 33.2083 [8000] valid_set's rmse: 33.2182 [9000] valid_set's rmse: 33.2138
[1000] valid_set's rmse: 37.8288 [2000] valid_set's rmse: 36.2891 [3000] valid_set's rmse: 35.978 [4000] valid_set's rmse: 35.8689 [5000] valid_set's rmse: 35.8846
[1000] valid_set's rmse: 39.2166 [2000] valid_set's rmse: 37.1437 [3000] valid_set's rmse: 36.7029 [4000] valid_set's rmse: 36.649 [5000] valid_set's rmse: 36.6295
[1000] valid_set's rmse: 35.6761 [2000] valid_set's rmse: 33.4538 [3000] valid_set's rmse: 32.9091 [4000] valid_set's rmse: 32.6792 [5000] valid_set's rmse: 32.561 [6000] valid_set's rmse: 32.5437
[1000] valid_set's rmse: 38.6292 [2000] valid_set's rmse: 37.1458 [3000] valid_set's rmse: 36.7946 [4000] valid_set's rmse: 36.5764 [5000] valid_set's rmse: 36.4936 [6000] valid_set's rmse: 36.464 [7000] valid_set's rmse: 36.4865
[1000] valid_set's rmse: 35.6962 [2000] valid_set's rmse: 33.3866 [3000] valid_set's rmse: 32.9207 [4000] valid_set's rmse: 32.8302
-34.346 = Validation score (root_mean_squared_error)
31.92s = Training runtime
0.72s = Validation runtime
Fitting model: LightGBM_BAG_L1 ... Training model for up to 364.54s of the 564.59s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 33.1713 [2000] valid_set's rmse: 33.0077
[1000] valid_set's rmse: 32.8635 [2000] valid_set's rmse: 32.6404
[1000] valid_set's rmse: 31.9543 [2000] valid_set's rmse: 31.343 [3000] valid_set's rmse: 30.9039 [4000] valid_set's rmse: 30.8612
[1000] valid_set's rmse: 35.8483 [2000] valid_set's rmse: 35.4773 [3000] valid_set's rmse: 35.3993
[1000] valid_set's rmse: 35.5388
[1000] valid_set's rmse: 31.6283
[1000] valid_set's rmse: 37.9327 [2000] valid_set's rmse: 37.4577
[1000] valid_set's rmse: 34.9434 [2000] valid_set's rmse: 34.6719
-33.9173 = Validation score (root_mean_squared_error)
13.71s = Training runtime
0.26s = Validation runtime
Fitting model: RandomForestMSE_BAG_L1 ... Training model for up to 349.74s of the 549.79s of remaining time.
-38.3578 = Validation score (root_mean_squared_error)
2.63s = Training runtime
0.45s = Validation runtime
Fitting model: CatBoost_BAG_L1 ... Training model for up to 346.25s of the 546.3s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, early stopping on iteration 5738.
Ran out of time, early stopping on iteration 5927.
Ran out of time, early stopping on iteration 6170.
Ran out of time, early stopping on iteration 6200.
Ran out of time, early stopping on iteration 6333.
Ran out of time, early stopping on iteration 6612.
Ran out of time, early stopping on iteration 7096.
Ran out of time, early stopping on iteration 7762.
-33.3524 = Validation score (root_mean_squared_error)
332.14s = Training runtime
0.07s = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L1 ... Training model for up to 13.87s of the 213.93s of remaining time.
-38.2024 = Validation score (root_mean_squared_error)
1.51s = Training runtime
0.42s = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L1 ... Training model for up to 11.28s of the 211.33s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, stopping training early. (Stopping on epoch 0)
Ran out of time, stopping training early. (Stopping on epoch 0)
Ran out of time, stopping training early. (Stopping on epoch 0)
Ran out of time, stopping training early. (Stopping on epoch 0)
Ran out of time, stopping training early. (Stopping on epoch 1)
Ran out of time, stopping training early. (Stopping on epoch 1)
Ran out of time, stopping training early. (Stopping on epoch 2)
Ran out of time, stopping training early. (Stopping on epoch 3)
-119.599 = Validation score (root_mean_squared_error)
10.2s = Training runtime
0.27s = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 0.65s of the 200.7s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Time limit exceeded... Skipping XGBoost_BAG_L1.
Fitting model: NeuralNetTorch_BAG_L1 ... Training model for up to 0.54s of the 200.59s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Time limit exceeded... Skipping NeuralNetTorch_BAG_L1.
Fitting model: LightGBMLarge_BAG_L1 ... Training model for up to 0.29s of the 200.34s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, early stopping on iteration 1. Best iteration is:
[1] valid_set's rmse: 176.713
Time limit exceeded... Skipping LightGBMLarge_BAG_L1.
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 200.09s of remaining time.
-32.0162 = Validation score (root_mean_squared_error)
0.38s = Training runtime
0.0s = Validation runtime
Fitting 9 L2 models ...
Fitting model: LightGBMXT_BAG_L2 ... Training model for up to 199.69s of the 199.68s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 34.8621
[1000] valid_set's rmse: 29.7564
[1000] valid_set's rmse: 30.6673
-31.2715 = Validation score (root_mean_squared_error)
7.26s = Training runtime
0.12s = Validation runtime
Fitting model: LightGBM_BAG_L2 ... Training model for up to 192.02s of the 192.01s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
[1000] valid_set's rmse: 31.0707
-30.454 = Validation score (root_mean_squared_error)
5.49s = Training runtime
0.08s = Validation runtime
Fitting model: RandomForestMSE_BAG_L2 ... Training model for up to 186.3s of the 186.29s of remaining time.
-31.5313 = Validation score (root_mean_squared_error)
6.34s = Training runtime
0.51s = Validation runtime
Fitting model: CatBoost_BAG_L2 ... Training model for up to 178.84s of the 178.82s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-30.4131 = Validation score (root_mean_squared_error)
62.33s = Training runtime
0.05s = Validation runtime
Fitting model: ExtraTreesMSE_BAG_L2 ... Training model for up to 116.4s of the 116.39s of remaining time.
-31.3642 = Validation score (root_mean_squared_error)
1.86s = Training runtime
0.46s = Validation runtime
Fitting model: NeuralNetFastAI_BAG_L2 ... Training model for up to 113.67s of the 113.66s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-29.5871 = Validation score (root_mean_squared_error)
79.39s = Training runtime
0.23s = Validation runtime
Fitting model: XGBoost_BAG_L2 ... Training model for up to 33.89s of the 33.88s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-30.5669 = Validation score (root_mean_squared_error)
7.99s = Training runtime
0.07s = Validation runtime
Fitting model: NeuralNetTorch_BAG_L2 ... Training model for up to 25.74s of the 25.73s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, stopping training early. (Stopping on epoch 5)
Ran out of time, stopping training early. (Stopping on epoch 6)
Ran out of time, stopping training early. (Stopping on epoch 6)
Ran out of time, stopping training early. (Stopping on epoch 7)
Ran out of time, stopping training early. (Stopping on epoch 6)
Ran out of time, stopping training early. (Stopping on epoch 8)
Ran out of time, stopping training early. (Stopping on epoch 8)
Ran out of time, stopping training early. (Stopping on epoch 11)
-32.7122 = Validation score (root_mean_squared_error)
24.13s = Training runtime
0.24s = Validation runtime
Fitting model: LightGBMLarge_BAG_L2 ... Training model for up to 1.31s of the 1.3s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
Ran out of time, early stopping on iteration 1. Best iteration is:
[1] valid_set's rmse: 174.026
Time limit exceeded... Skipping LightGBMLarge_BAG_L2.
Completed 1/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the 1.07s of remaining time.
-29.3138 = Validation score (root_mean_squared_error)
0.36s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 599.3s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20220401_143648/")
CPU times: user 29min 51s, sys: 54.6 s, total: 30min 45s
Wall time: 9min 59s
predictor_new_features.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L3 -29.313835 2.834204 547.701941 0.000501 0.356613 3 True 18
1 NeuralNetFastAI_BAG_L2 -29.587140 2.640517 471.536646 0.226674 79.394585 2 True 15
2 CatBoost_BAG_L2 -30.413103 2.461370 454.472600 0.047527 62.330538 2 True 13
3 LightGBM_BAG_L2 -30.454012 2.489937 397.634877 0.076094 5.492815 2 True 11
4 XGBoost_BAG_L2 -30.566856 2.483407 400.127391 0.069564 7.985329 2 True 16
5 LightGBMXT_BAG_L2 -31.271497 2.532743 399.403208 0.118901 7.261146 2 True 10
6 ExtraTreesMSE_BAG_L2 -31.364232 2.872459 393.998612 0.458616 1.856550 2 True 14
7 RandomForestMSE_BAG_L2 -31.531317 2.918926 398.478997 0.505084 6.336935 2 True 12
8 WeightedEnsemble_L2 -32.016161 1.612180 380.789136 0.000858 0.377501 2 True 9
9 NeuralNetTorch_BAG_L2 -32.712166 2.650411 416.275231 0.236568 24.133169 2 True 17
10 CatBoost_BAG_L1 -33.352401 0.071735 332.139227 0.071735 332.139227 1 True 6
11 LightGBM_BAG_L1 -33.917339 0.256798 13.708674 0.256798 13.708674 1 True 4
12 LightGBMXT_BAG_L1 -34.345997 0.722591 31.919058 0.722591 31.919058 1 True 3
13 ExtraTreesMSE_BAG_L1 -38.202438 0.424710 1.506421 0.424710 1.506421 1 True 7
14 RandomForestMSE_BAG_L1 -38.357786 0.450909 2.625593 0.450909 2.625593 1 True 5
15 KNeighborsDist_BAG_L1 -84.146423 0.109289 0.019083 0.109289 0.019083 1 True 2
16 KNeighborsUnif_BAG_L1 -101.588176 0.108525 0.019136 0.108525 0.019136 1 True 1
17 NeuralNetFastAI_BAG_L1 -119.598987 0.269285 10.204870 0.269285 10.204870 1 True 8
Number of models trained: 18
Types of models trained:
{'StackerEnsembleModel_KNN', 'StackerEnsembleModel_XT', 'WeightedEnsembleModel', 'StackerEnsembleModel_NNFastAiTabular', 'StackerEnsembleModel_CatBoost', 'StackerEnsembleModel_TabularNeuralNetTorch', 'StackerEnsembleModel_LGB', 'StackerEnsembleModel_RF', 'StackerEnsembleModel_XGBoost'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 4 | ['humidity', 'month', 'day', 'hour']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20220401_143648/SummaryOfModels.html
*** End of fit() summary ***
{'model_types': {'KNeighborsUnif_BAG_L1': 'StackerEnsembleModel_KNN',
'KNeighborsDist_BAG_L1': 'StackerEnsembleModel_KNN',
'LightGBMXT_BAG_L1': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L1': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L1': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L1': 'StackerEnsembleModel_XT',
'NeuralNetFastAI_BAG_L1': 'StackerEnsembleModel_NNFastAiTabular',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBMXT_BAG_L2': 'StackerEnsembleModel_LGB',
'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
'RandomForestMSE_BAG_L2': 'StackerEnsembleModel_RF',
'CatBoost_BAG_L2': 'StackerEnsembleModel_CatBoost',
'ExtraTreesMSE_BAG_L2': 'StackerEnsembleModel_XT',
'NeuralNetFastAI_BAG_L2': 'StackerEnsembleModel_NNFastAiTabular',
'XGBoost_BAG_L2': 'StackerEnsembleModel_XGBoost',
'NeuralNetTorch_BAG_L2': 'StackerEnsembleModel_TabularNeuralNetTorch',
'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
'model_performance': {'KNeighborsUnif_BAG_L1': -101.58817625927213,
'KNeighborsDist_BAG_L1': -84.14642264302962,
'LightGBMXT_BAG_L1': -34.34599701170154,
'LightGBM_BAG_L1': -33.91733862651761,
'RandomForestMSE_BAG_L1': -38.35778601783482,
'CatBoost_BAG_L1': -33.35240069343452,
'ExtraTreesMSE_BAG_L1': -38.20243803292602,
'NeuralNetFastAI_BAG_L1': -119.59898747214625,
'WeightedEnsemble_L2': -32.01616058636411,
'LightGBMXT_BAG_L2': -31.27149744828551,
'LightGBM_BAG_L2': -30.454012181725293,
'RandomForestMSE_BAG_L2': -31.531317387505304,
'CatBoost_BAG_L2': -30.413103312963404,
'ExtraTreesMSE_BAG_L2': -31.364231776136368,
'NeuralNetFastAI_BAG_L2': -29.587139926922703,
'XGBoost_BAG_L2': -30.566856074940024,
'NeuralNetTorch_BAG_L2': -32.712166465414654,
'WeightedEnsemble_L3': -29.313834847169364},
'model_best': 'WeightedEnsemble_L3',
'model_paths': {'KNeighborsUnif_BAG_L1': 'AutogluonModels/ag-20220401_143648/models/KNeighborsUnif_BAG_L1/',
'KNeighborsDist_BAG_L1': 'AutogluonModels/ag-20220401_143648/models/KNeighborsDist_BAG_L1/',
'LightGBMXT_BAG_L1': 'AutogluonModels/ag-20220401_143648/models/LightGBMXT_BAG_L1/',
'LightGBM_BAG_L1': 'AutogluonModels/ag-20220401_143648/models/LightGBM_BAG_L1/',
'RandomForestMSE_BAG_L1': 'AutogluonModels/ag-20220401_143648/models/RandomForestMSE_BAG_L1/',
'CatBoost_BAG_L1': 'AutogluonModels/ag-20220401_143648/models/CatBoost_BAG_L1/',
'ExtraTreesMSE_BAG_L1': 'AutogluonModels/ag-20220401_143648/models/ExtraTreesMSE_BAG_L1/',
'NeuralNetFastAI_BAG_L1': 'AutogluonModels/ag-20220401_143648/models/NeuralNetFastAI_BAG_L1/',
'WeightedEnsemble_L2': 'AutogluonModels/ag-20220401_143648/models/WeightedEnsemble_L2/',
'LightGBMXT_BAG_L2': 'AutogluonModels/ag-20220401_143648/models/LightGBMXT_BAG_L2/',
'LightGBM_BAG_L2': 'AutogluonModels/ag-20220401_143648/models/LightGBM_BAG_L2/',
'RandomForestMSE_BAG_L2': 'AutogluonModels/ag-20220401_143648/models/RandomForestMSE_BAG_L2/',
'CatBoost_BAG_L2': 'AutogluonModels/ag-20220401_143648/models/CatBoost_BAG_L2/',
'ExtraTreesMSE_BAG_L2': 'AutogluonModels/ag-20220401_143648/models/ExtraTreesMSE_BAG_L2/',
'NeuralNetFastAI_BAG_L2': 'AutogluonModels/ag-20220401_143648/models/NeuralNetFastAI_BAG_L2/',
'XGBoost_BAG_L2': 'AutogluonModels/ag-20220401_143648/models/XGBoost_BAG_L2/',
'NeuralNetTorch_BAG_L2': 'AutogluonModels/ag-20220401_143648/models/NeuralNetTorch_BAG_L2/',
'WeightedEnsemble_L3': 'AutogluonModels/ag-20220401_143648/models/WeightedEnsemble_L3/'},
'model_fit_times': {'KNeighborsUnif_BAG_L1': 0.01913595199584961,
'KNeighborsDist_BAG_L1': 0.019083023071289062,
'LightGBMXT_BAG_L1': 31.919058322906494,
'LightGBM_BAG_L1': 13.70867371559143,
'RandomForestMSE_BAG_L1': 2.6255927085876465,
'CatBoost_BAG_L1': 332.1392273902893,
'ExtraTreesMSE_BAG_L1': 1.5064210891723633,
'NeuralNetFastAI_BAG_L1': 10.204869508743286,
'WeightedEnsemble_L2': 0.3775007724761963,
'LightGBMXT_BAG_L2': 7.261145830154419,
'LightGBM_BAG_L2': 5.492815017700195,
'RandomForestMSE_BAG_L2': 6.336934804916382,
'CatBoost_BAG_L2': 62.33053803443909,
'ExtraTreesMSE_BAG_L2': 1.8565499782562256,
'NeuralNetFastAI_BAG_L2': 79.39458465576172,
'XGBoost_BAG_L2': 7.985328912734985,
'NeuralNetTorch_BAG_L2': 24.133169412612915,
'WeightedEnsemble_L3': 0.3566131591796875},
'model_pred_times': {'KNeighborsUnif_BAG_L1': 0.10852503776550293,
'KNeighborsDist_BAG_L1': 0.10928893089294434,
'LightGBMXT_BAG_L1': 0.7225911617279053,
'LightGBM_BAG_L1': 0.25679779052734375,
'RandomForestMSE_BAG_L1': 0.450908899307251,
'CatBoost_BAG_L1': 0.07173538208007812,
'ExtraTreesMSE_BAG_L1': 0.4247100353240967,
'NeuralNetFastAI_BAG_L1': 0.2692854404449463,
'WeightedEnsemble_L2': 0.000858306884765625,
'LightGBMXT_BAG_L2': 0.11890077590942383,
'LightGBM_BAG_L2': 0.07609438896179199,
'RandomForestMSE_BAG_L2': 0.5050837993621826,
'CatBoost_BAG_L2': 0.047527313232421875,
'ExtraTreesMSE_BAG_L2': 0.4586162567138672,
'NeuralNetFastAI_BAG_L2': 0.22667431831359863,
'XGBoost_BAG_L2': 0.0695643424987793,
'NeuralNetTorch_BAG_L2': 0.23656845092773438,
'WeightedEnsemble_L3': 0.0005009174346923828},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'KNeighborsUnif_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'KNeighborsDist_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'LightGBMXT_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'NeuralNetFastAI_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBMXT_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'RandomForestMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'CatBoost_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'ExtraTreesMSE_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True,
'use_child_oof': True},
'NeuralNetFastAI_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'XGBoost_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'NeuralNetTorch_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard':                     model   score_val  pred_time_val    fit_time  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0      WeightedEnsemble_L3   -29.313835  2.834204  547.701941  0.000501    0.356613  3  True  18
1   NeuralNetFastAI_BAG_L2   -29.587140  2.640517  471.536646  0.226674   79.394585  2  True  15
2          CatBoost_BAG_L2   -30.413103  2.461370  454.472600  0.047527   62.330538  2  True  13
3          LightGBM_BAG_L2   -30.454012  2.489937  397.634877  0.076094    5.492815  2  True  11
4           XGBoost_BAG_L2   -30.566856  2.483407  400.127391  0.069564    7.985329  2  True  16
5        LightGBMXT_BAG_L2   -31.271497  2.532743  399.403208  0.118901    7.261146  2  True  10
6     ExtraTreesMSE_BAG_L2   -31.364232  2.872459  393.998612  0.458616    1.856550  2  True  14
7   RandomForestMSE_BAG_L2   -31.531317  2.918926  398.478997  0.505084    6.336935  2  True  12
8      WeightedEnsemble_L2   -32.016161  1.612180  380.789136  0.000858    0.377501  2  True   9
9    NeuralNetTorch_BAG_L2   -32.712166  2.650411  416.275231  0.236568   24.133169  2  True  17
10         CatBoost_BAG_L1   -33.352401  0.071735  332.139227  0.071735  332.139227  1  True   6
11         LightGBM_BAG_L1   -33.917339  0.256798   13.708674  0.256798   13.708674  1  True   4
12       LightGBMXT_BAG_L1   -34.345997  0.722591   31.919058  0.722591   31.919058  1  True   3
13    ExtraTreesMSE_BAG_L1   -38.202438  0.424710    1.506421  0.424710    1.506421  1  True   7
14  RandomForestMSE_BAG_L1   -38.357786  0.450909    2.625593  0.450909    2.625593  1  True   5
15   KNeighborsDist_BAG_L1   -84.146423  0.109289    0.019083  0.109289    0.019083  1  True   2
16   KNeighborsUnif_BAG_L1  -101.588176  0.108525    0.019136  0.108525    0.019136  1  True   1
17  NeuralNetFastAI_BAG_L1  -119.598987  0.269285   10.204870  0.269285   10.204870  1  True   8  }
# Remember to set all negative values to zero
predictions_new_features = predictor_new_features.predict(test)
predictions_new_features = predictions_new_features.apply(lambda x: np.array(x).clip(min=0))
predictions_new_features
0 14.919042
1 8.825162
2 8.423960
3 8.735602
4 8.823152
...
6488 269.695770
6489 211.033798
6490 158.170288
6491 114.352295
6492 82.353851
Name: count, Length: 6493, dtype: float64
submission_new_features = pd.read_csv('bike-sharing-demand/sampleSubmission.csv')
# Same as before: attach the predictions and write the submission file
submission_new_features["count"] = predictions_new_features
submission_new_features.to_csv("submission_new_features.csv", index=False)
# I work on a work laptop and have no permission to save kaggle.json to the root
# directory, so all submissions were done on the Kaggle website and the code below was not run
# !kaggle competitions submit -c bike-sharing-demand -f submission_new_features.csv -m "new features"
# !kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
0.74591
Next, tune the models using the hyperparameter and hyperparameter_tune_kwargs arguments.
hyperparameters = {'NN': {'num_epochs': 5}, 'GBM': {'num_boost_round': 30}, 'XGB': {'max_depth': 3}}
predictor_new_hpo = TabularPredictor(label='count').fit(train_data=train,
time_limit=600, presets="best_quality",
hyperparameters=hyperparameters)
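The run above pins each hyperparameter to a single value. AutoGluon can also search over multiple configurations per model when `hyperparameter_tune_kwargs` is passed to `fit()`. A hedged sketch of the config dicts (the trial count and scheduler/searcher values are illustrative, not what was run here):

```python
# Fixed starting values, same as the cell above
hyperparameters = {
    'NN': {'num_epochs': 5},
    'GBM': {'num_boost_round': 30},
    'XGB': {'max_depth': 3},
}

# Search settings: how many configurations to try and how to schedule the trials
hyperparameter_tune_kwargs = {
    'num_trials': 5,       # illustrative; more trials = better search, more time
    'scheduler': 'local',  # run trials sequentially on this machine
    'searcher': 'auto',    # let AutoGluon choose the search strategy
}

# Sketch of the fit call (commented out; assumes `train` from earlier cells):
# predictor = TabularPredictor(label='count').fit(
#     train_data=train, time_limit=600, presets='best_quality',
#     hyperparameters=hyperparameters,
#     hyperparameter_tune_kwargs=hyperparameter_tune_kwargs)
```

With a 600s time limit, searching eats into the budget available for bagging and stacking, which is one reason a tuned run can score worse on validation than the untuned `best_quality` run.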
No path specified. Models will be saved in: "AutogluonModels/ag-20220401_161520/"
Presets specified: ['best_quality']
Beginning AutoGluon training ... Time limit = 600s
AutoGluon will save models to "AutogluonModels/ag-20220401_161520/"
AutoGluon Version: 0.4.0
Python Version: 3.7.11
Operating System: Darwin
Train Data Rows: 10886
Train Data Columns: 13
Label Column: count
Preprocessing data ...
AutoGluon infers your prediction problem is: 'regression' (because dtype of label-column == int and many unique label-values observed).
Label info (max, min, mean, stddev): (977, 1, 191.57413, 181.14445)
If 'regression' is not the correct problem_type, please manually specify the problem_type parameter during predictor init (You may specify problem_type as one of: ['binary', 'multiclass', 'regression'])
Using Feature Generators to preprocess the data ...
Fitting AutoMLPipelineFeatureGenerator...
Available Memory: 6221.07 MB
Train Data (Original) Memory Usage: 0.98 MB (0.0% of available memory)
Inferring data type of each feature based on column values. Set feature_metadata_in to manually specify special dtypes of the features.
Stage 1 Generators:
Fitting AsTypeFeatureGenerator...
Note: Converting 3 features to boolean dtype as they only contain 2 unique values.
Stage 2 Generators:
Fitting FillNaFeatureGenerator...
Stage 3 Generators:
Fitting IdentityFeatureGenerator...
Fitting CategoryFeatureGenerator...
Fitting CategoryMemoryMinimizeFeatureGenerator...
Fitting DatetimeFeatureGenerator...
Stage 4 Generators:
Fitting DropUniqueFeatureGenerator...
Types of features in original data (raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('datetime', []) : 1 | ['datetime']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 7 | ['holiday', 'workingday', 'humidity', 'year', 'month', ...]
Types of features in processed data (raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 4 | ['humidity', 'month', 'day', 'hour']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
0.1s = Fit runtime
13 features in original data used to generate 17 features in processed data.
Train Data (Processed) Memory Usage: 1.1 MB (0.0% of available memory)
Data preprocessing and feature engineering runtime = 0.13s ...
AutoGluon will gauge predictive performance using evaluation metric: 'root_mean_squared_error'
To change this, specify the eval_metric parameter of Predictor()
AutoGluon will fit 2 stack levels (L1 to L2) ...
WARNING: "NN" model has been deprecated in v0.4.0 and renamed to "NN_MXNET". Starting in v0.5.0, specifying "NN" or "NN_MXNET" will raise an exception. Consider instead specifying "NN_TORCH".
Fitting 3 L1 models ...
Fitting model: LightGBM_BAG_L1 ... Training model for up to 399.81s of the 599.86s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-72.7019 = Validation score (root_mean_squared_error)
1.74s = Training runtime
0.04s = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 397.99s of the 598.04s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-38.192 = Validation score (root_mean_squared_error)
55.94s = Training runtime
0.27s = Validation runtime
Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 341.33s of the 541.38s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-182.4318 = Validation score (root_mean_squared_error)
26.98s = Training runtime
1.81s = Validation runtime
Repeating k-fold bagging: 2/20
Fitting model: LightGBM_BAG_L1 ... Training model for up to 312.47s of the 512.52s of remaining time.
Fitting 8 child models (S2F1 - S2F8) | Fitting with SequentialLocalFoldFittingStrategy
-72.5139 = Validation score (root_mean_squared_error)
3.49s = Training runtime
0.08s = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 310.64s of the 510.69s of remaining time.
Fitting 8 child models (S2F1 - S2F8) | Fitting with SequentialLocalFoldFittingStrategy
-37.2239 = Validation score (root_mean_squared_error)
111.94s = Training runtime
0.53s = Validation runtime
Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 253.92s of the 453.97s of remaining time.
Fitting 8 child models (S2F1 - S2F8) | Fitting with SequentialLocalFoldFittingStrategy
-177.0235 = Validation score (root_mean_squared_error)
54.56s = Training runtime
3.59s = Validation runtime
Repeating k-fold bagging: 3/20
Fitting model: LightGBM_BAG_L1 ... Training model for up to 224.5s of the 424.56s of remaining time.
Fitting 8 child models (S3F1 - S3F8) | Fitting with SequentialLocalFoldFittingStrategy
-72.4113 = Validation score (root_mean_squared_error)
5.19s = Training runtime
0.12s = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 222.73s of the 422.78s of remaining time.
Fitting 8 child models (S3F1 - S3F8) | Fitting with SequentialLocalFoldFittingStrategy
-36.8612 = Validation score (root_mean_squared_error)
173.52s = Training runtime
0.84s = Validation runtime
Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 160.27s of the 360.33s of remaining time.
Fitting 8 child models (S3F1 - S3F8) | Fitting with SequentialLocalFoldFittingStrategy
-175.4507 = Validation score (root_mean_squared_error)
81.59s = Training runtime
5.38s = Validation runtime
Repeating k-fold bagging: 4/20
Fitting model: LightGBM_BAG_L1 ... Training model for up to 131.39s of the 331.45s of remaining time.
Fitting 8 child models (S4F1 - S4F8) | Fitting with SequentialLocalFoldFittingStrategy
-72.4406 = Validation score (root_mean_squared_error)
6.88s = Training runtime
0.16s = Validation runtime
Fitting model: XGBoost_BAG_L1 ... Training model for up to 129.63s of the 329.68s of remaining time.
Fitting 8 child models (S4F1 - S4F8) | Fitting with SequentialLocalFoldFittingStrategy
-36.7016 = Validation score (root_mean_squared_error)
233.87s = Training runtime
1.14s = Validation runtime
Fitting model: NeuralNetMXNet_BAG_L1 ... Training model for up to 68.44s of the 268.49s of remaining time.
Fitting 8 child models (S4F1 - S4F8) | Fitting with SequentialLocalFoldFittingStrategy
-174.4798 = Validation score (root_mean_squared_error)
108.97s = Training runtime
7.17s = Validation runtime
Completed 4/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L2 ... Training model for up to 360.0s of the 239.24s of remaining time.
-36.4579 = Validation score (root_mean_squared_error)
0.15s = Training runtime
0.0s = Validation runtime
WARNING: "NN" model has been deprecated in v0.4.0 and renamed to "NN_MXNET". Starting in v0.5.0, specifying "NN" or "NN_MXNET" will raise an exception. Consider instead specifying "NN_TORCH".
Fitting 3 L2 models ...
Fitting model: LightGBM_BAG_L2 ... Training model for up to 239.08s of the 239.07s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-52.0497 = Validation score (root_mean_squared_error)
1.74s = Training runtime
0.04s = Validation runtime
Fitting model: XGBoost_BAG_L2 ... Training model for up to 237.27s of the 237.26s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-34.9996 = Validation score (root_mean_squared_error)
3.36s = Training runtime
0.07s = Validation runtime
Fitting model: NeuralNetMXNet_BAG_L2 ... Training model for up to 233.78s of the 233.77s of remaining time.
Fitting 8 child models (S1F1 - S1F8) | Fitting with SequentialLocalFoldFittingStrategy
-95.8614 = Validation score (root_mean_squared_error)
26.51s = Training runtime
1.81s = Validation runtime
Repeating k-fold bagging: 2/20
Fitting model: LightGBM_BAG_L2 ... Training model for up to 205.39s of the 205.38s of remaining time.
Fitting 8 child models (S2F1 - S2F8) | Fitting with SequentialLocalFoldFittingStrategy
-52.05 = Validation score (root_mean_squared_error)
3.51s = Training runtime
0.08s = Validation runtime
Fitting model: XGBoost_BAG_L2 ... Training model for up to 203.54s of the 203.53s of remaining time.
Fitting 8 child models (S2F1 - S2F8) | Fitting with SequentialLocalFoldFittingStrategy
-34.9874 = Validation score (root_mean_squared_error)
6.65s = Training runtime
0.14s = Validation runtime
Fitting model: NeuralNetMXNet_BAG_L2 ... Training model for up to 200.13s of the 200.12s of remaining time.
Fitting 8 child models (S2F1 - S2F8) | Fitting with SequentialLocalFoldFittingStrategy
-92.0949 = Validation score (root_mean_squared_error)
53.54s = Training runtime
3.64s = Validation runtime
Repeating k-fold bagging: 3/20
Fitting model: LightGBM_BAG_L2 ... Training model for up to 171.21s of the 171.2s of remaining time.
Fitting 8 child models (S3F1 - S3F8) | Fitting with SequentialLocalFoldFittingStrategy
-51.9999 = Validation score (root_mean_squared_error)
5.34s = Training runtime
0.12s = Validation runtime
Fitting model: XGBoost_BAG_L2 ... Training model for up to 169.29s of the 169.28s of remaining time.
Fitting 8 child models (S3F1 - S3F8) | Fitting with SequentialLocalFoldFittingStrategy
-34.9086 = Validation score (root_mean_squared_error)
9.96s = Training runtime
0.21s = Validation runtime
Fitting model: NeuralNetMXNet_BAG_L2 ... Training model for up to 165.86s of the 165.85s of remaining time.
Fitting 8 child models (S3F1 - S3F8) | Fitting with SequentialLocalFoldFittingStrategy
-91.991 = Validation score (root_mean_squared_error)
80.78s = Training runtime
5.47s = Validation runtime
Repeating k-fold bagging: 4/20
Fitting model: LightGBM_BAG_L2 ... Training model for up to 136.73s of the 136.72s of remaining time.
Fitting 8 child models (S4F1 - S4F8) | Fitting with SequentialLocalFoldFittingStrategy
-52.0137 = Validation score (root_mean_squared_error)
7.1s = Training runtime
0.16s = Validation runtime
Fitting model: XGBoost_BAG_L2 ... Training model for up to 134.89s of the 134.88s of remaining time.
Fitting 8 child models (S4F1 - S4F8) | Fitting with SequentialLocalFoldFittingStrategy
-34.8395 = Validation score (root_mean_squared_error)
13.27s = Training runtime
0.28s = Validation runtime
Fitting model: NeuralNetMXNet_BAG_L2 ... Training model for up to 131.44s of the 131.43s of remaining time.
Fitting 8 child models (S4F1 - S4F8) | Fitting with SequentialLocalFoldFittingStrategy
-91.9238 = Validation score (root_mean_squared_error)
108.48s = Training runtime
7.3s = Validation runtime
Repeating k-fold bagging: 5/20
Fitting model: LightGBM_BAG_L2 ... Training model for up to 101.84s of the 101.84s of remaining time.
Fitting 8 child models (S5F1 - S5F8) | Fitting with SequentialLocalFoldFittingStrategy
-52.0103 = Validation score (root_mean_squared_error)
8.85s = Training runtime
0.2s = Validation runtime
Fitting model: XGBoost_BAG_L2 ... Training model for up to 100.02s of the 100.01s of remaining time.
Fitting 8 child models (S5F1 - S5F8) | Fitting with SequentialLocalFoldFittingStrategy
-34.8078 = Validation score (root_mean_squared_error)
16.53s = Training runtime
0.35s = Validation runtime
Fitting model: NeuralNetMXNet_BAG_L2 ... Training model for up to 96.64s of the 96.64s of remaining time.
Fitting 8 child models (S5F1 - S5F8) | Fitting with SequentialLocalFoldFittingStrategy
-91.7348 = Validation score (root_mean_squared_error)
135.62s = Training runtime
9.11s = Validation runtime
Repeating k-fold bagging: 6/20
Fitting model: LightGBM_BAG_L2 ... Training model for up to 67.64s of the 67.63s of remaining time.
Fitting 8 child models (S6F1 - S6F8) | Fitting with SequentialLocalFoldFittingStrategy
-51.9992 = Validation score (root_mean_squared_error)
10.58s = Training runtime
0.23s = Validation runtime
Fitting model: XGBoost_BAG_L2 ... Training model for up to 65.82s of the 65.81s of remaining time.
Fitting 8 child models (S6F1 - S6F8) | Fitting with SequentialLocalFoldFittingStrategy
-34.7687 = Validation score (root_mean_squared_error)
20.4s = Training runtime
0.42s = Validation runtime
Fitting model: NeuralNetMXNet_BAG_L2 ... Training model for up to 61.81s of the 61.81s of remaining time.
Fitting 8 child models (S6F1 - S6F8) | Fitting with SequentialLocalFoldFittingStrategy
-91.3669 = Validation score (root_mean_squared_error)
162.63s = Training runtime
10.95s = Validation runtime
Completed 6/20 k-fold bagging repeats ...
Fitting model: WeightedEnsemble_L3 ... Training model for up to 360.0s of the 32.88s of remaining time.
-34.7674 = Validation score (root_mean_squared_error)
0.16s = Training runtime
0.0s = Validation runtime
AutoGluon training complete, total runtime = 567.3s ... Best model: "WeightedEnsemble_L3"
TabularPredictor saved. To load, use: predictor = TabularPredictor.load("AutogluonModels/ag-20220401_161520/")
predictor_new_hpo.fit_summary()
*** Summary of fit() ***
Estimated performance of each model:
model score_val pred_time_val fit_time pred_time_val_marginal fit_time_marginal stack_level can_infer fit_order
0 WeightedEnsemble_L3 -34.767351 19.843176 532.907516 0.000487 0.155106 3 True 8
1 XGBoost_BAG_L2 -34.768740 8.887749 370.126898 0.422407 20.401735 2 True 6
2 WeightedEnsemble_L2 -36.457861 1.293670 240.904667 0.000435 0.152501 2 True 4
3 XGBoost_BAG_L1 -36.701641 1.137856 233.867603 1.137856 233.867603 1 True 2
4 LightGBM_BAG_L2 -51.999226 8.698473 360.308793 0.233132 10.583630 2 True 5
5 LightGBM_BAG_L1 -72.440563 0.155379 6.884563 0.155379 6.884563 1 True 1
6 NeuralNetMXNet_BAG_L2 -91.366926 19.420281 512.350674 10.954940 162.625511 2 True 7
7 NeuralNetMXNet_BAG_L1 -174.479839 7.172106 108.972997 7.172106 108.972997 1 True 3
Number of models trained: 8
Types of models trained:
{'StackerEnsembleModel_LGB', 'StackerEnsembleModel_TabularNeuralNetMxnet', 'WeightedEnsembleModel', 'StackerEnsembleModel_XGBoost'}
Bagging used: True (with 8 folds)
Multi-layer stack-ensembling used: True (with 3 levels)
Feature Metadata (Processed):
(raw dtype, special dtypes):
('category', []) : 2 | ['season', 'weather']
('float', []) : 3 | ['temp', 'atemp', 'windspeed']
('int', []) : 4 | ['humidity', 'month', 'day', 'hour']
('int', ['bool']) : 3 | ['holiday', 'workingday', 'year']
('int', ['datetime_as_int']) : 5 | ['datetime', 'datetime.year', 'datetime.month', 'datetime.day', 'datetime.dayofweek']
Plot summary of models saved to file: AutogluonModels/ag-20220401_161520/SummaryOfModels.html
*** End of fit() summary ***
{'model_types': {'LightGBM_BAG_L1': 'StackerEnsembleModel_LGB',
'XGBoost_BAG_L1': 'StackerEnsembleModel_XGBoost',
'NeuralNetMXNet_BAG_L1': 'StackerEnsembleModel_TabularNeuralNetMxnet',
'WeightedEnsemble_L2': 'WeightedEnsembleModel',
'LightGBM_BAG_L2': 'StackerEnsembleModel_LGB',
'XGBoost_BAG_L2': 'StackerEnsembleModel_XGBoost',
'NeuralNetMXNet_BAG_L2': 'StackerEnsembleModel_TabularNeuralNetMxnet',
'WeightedEnsemble_L3': 'WeightedEnsembleModel'},
'model_performance': {'LightGBM_BAG_L1': -72.44056300875371,
'XGBoost_BAG_L1': -36.70164113747882,
'NeuralNetMXNet_BAG_L1': -174.4798391007569,
'WeightedEnsemble_L2': -36.4578613497465,
'LightGBM_BAG_L2': -51.99922562039634,
'XGBoost_BAG_L2': -34.76873992028658,
'NeuralNetMXNet_BAG_L2': -91.36692644159668,
'WeightedEnsemble_L3': -34.76735112159422},
'model_best': 'WeightedEnsemble_L3',
'model_paths': {'LightGBM_BAG_L1': 'AutogluonModels/ag-20220401_161520/models/LightGBM_BAG_L1/',
'XGBoost_BAG_L1': 'AutogluonModels/ag-20220401_161520/models/XGBoost_BAG_L1/',
'NeuralNetMXNet_BAG_L1': 'AutogluonModels/ag-20220401_161520/models/NeuralNetMXNet_BAG_L1/',
'WeightedEnsemble_L2': 'AutogluonModels/ag-20220401_161520/models/WeightedEnsemble_L2/',
'LightGBM_BAG_L2': 'AutogluonModels/ag-20220401_161520/models/LightGBM_BAG_L2/',
'XGBoost_BAG_L2': 'AutogluonModels/ag-20220401_161520/models/XGBoost_BAG_L2/',
'NeuralNetMXNet_BAG_L2': 'AutogluonModels/ag-20220401_161520/models/NeuralNetMXNet_BAG_L2/',
'WeightedEnsemble_L3': 'AutogluonModels/ag-20220401_161520/models/WeightedEnsemble_L3/'},
'model_fit_times': {'LightGBM_BAG_L1': 6.884562730789185,
'XGBoost_BAG_L1': 233.86760330200195,
'NeuralNetMXNet_BAG_L1': 108.97299695014954,
'WeightedEnsemble_L2': 0.15250110626220703,
'LightGBM_BAG_L2': 10.583630084991455,
'XGBoost_BAG_L2': 20.401735067367554,
'NeuralNetMXNet_BAG_L2': 162.62551140785217,
'WeightedEnsemble_L3': 0.1551060676574707},
'model_pred_times': {'LightGBM_BAG_L1': 0.1553792953491211,
'XGBoost_BAG_L1': 1.1378560066223145,
'NeuralNetMXNet_BAG_L1': 7.172106027603149,
'WeightedEnsemble_L2': 0.00043511390686035156,
'LightGBM_BAG_L2': 0.23313164710998535,
'XGBoost_BAG_L2': 0.4224073886871338,
'NeuralNetMXNet_BAG_L2': 10.9549400806427,
'WeightedEnsemble_L3': 0.0004868507385253906},
'num_bag_folds': 8,
'max_stack_level': 3,
'model_hyperparams': {'LightGBM_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'XGBoost_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'NeuralNetMXNet_BAG_L1': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L2': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'LightGBM_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'XGBoost_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'NeuralNetMXNet_BAG_L2': {'use_orig_features': True,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True},
'WeightedEnsemble_L3': {'use_orig_features': False,
'max_base_models': 25,
'max_base_models_per_type': 5,
'save_bag_folds': True}},
'leaderboard': model score_val pred_time_val fit_time \
0 WeightedEnsemble_L3 -34.767351 19.843176 532.907516
1 XGBoost_BAG_L2 -34.768740 8.887749 370.126898
2 WeightedEnsemble_L2 -36.457861 1.293670 240.904667
3 XGBoost_BAG_L1 -36.701641 1.137856 233.867603
4 LightGBM_BAG_L2 -51.999226 8.698473 360.308793
5 LightGBM_BAG_L1 -72.440563 0.155379 6.884563
6 NeuralNetMXNet_BAG_L2 -91.366926 19.420281 512.350674
7 NeuralNetMXNet_BAG_L1 -174.479839 7.172106 108.972997
pred_time_val_marginal fit_time_marginal stack_level can_infer \
0 0.000487 0.155106 3 True
1 0.422407 20.401735 2 True
2 0.000435 0.152501 2 True
3 1.137856 233.867603 1 True
4 0.233132 10.583630 2 True
5 0.155379 6.884563 1 True
6 10.954940 162.625511 2 True
7 7.172106 108.972997 1 True
fit_order
0 8
1 6
2 4
3 2
4 5
5 1
6 7
7 3 }
# Remember to set all negative values to zero
predictions_new_hpo = predictor_new_hpo.predict(test)
predictions_new_hpo = predictions_new_hpo.apply(lambda x: np.array(x).clip(min=0))
submission_new_hpo = pd.read_csv('bike-sharing-demand/sampleSubmission.csv')
# Same process as before for submitting predictions
submission_new_hpo["count"] = predictions_new_hpo
submission_new_hpo.to_csv("submission_new_hpo.csv", index=False)
!kaggle competitions submit -c bike-sharing-demand -f submission_new_hpo.csv -m "new features with hyperparameters"
!kaggle competitions submissions -c bike-sharing-demand | tail -n +1 | head -n 6
0.52941
# Taking the top model score from each training run and creating a line plot to show improvement
# You can create these in the notebook and save them to PNG or use some other tool (e.g. google sheets, excel)
fig = pd.DataFrame(
    {
        "model": ["initial", "add_features", "hpo"],
        "score": [-50.082320, -29.313835, -34.807948]
    }
).plot(x="model", y="score", figsize=(8, 6)).get_figure()
fig.savefig('model_train_score.png')
# Take the 3 kaggle scores and create a line plot to show improvement
fig = pd.DataFrame(
    {
        "test_eval": ["initial", "add_features", "hpo"],
        "score": [1.85, 0.75, 0.53]
    }
).plot(x="test_eval", y="score", figsize=(8, 6)).get_figure()
fig.savefig('model_test_score.png')
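The validation scores in these plots were typed in by hand; they could instead be pulled from each run's leaderboard. A sketch on a leaderboard-shaped toy DataFrame (column names follow AutoGluon's `leaderboard()` output; for the real runs you would substitute `predictor.leaderboard(silent=True)`):

```python
import pandas as pd

# Toy stand-in for predictor.leaderboard(silent=True); the real output has these columns
leaderboard = pd.DataFrame({
    'model': ['WeightedEnsemble_L3', 'XGBoost_BAG_L2', 'LightGBM_BAG_L1'],
    'score_val': [-34.767351, -34.768740, -72.440563],
})

# AutoGluon reports RMSE negated so that higher is always better,
# which means the best model's score is the column maximum
best_score = leaderboard['score_val'].max()
print(best_score)  # -34.767351
```

Doing this per predictor removes the risk of a copy-paste mistake when the runs are repeated.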
hyperparameters = {'NN': {'num_epochs': 5}, 'GBM': {'num_boost_round': 30}, 'XGB': {'max_depth': 3}}
# The 3 hyperparameters we tuned, with the kaggle score as the result
pd.DataFrame({
    "model": ["initial_model", "add_features_model", "hpo_model"],
    "hpo1": [np.nan, np.nan, "NN: {num_epochs: 5}"],
    "hpo2": [np.nan, np.nan, "GBM: {num_boost_round: 30}"],
    "hpo3": [np.nan, np.nan, "XGB: {max_depth: 3}"],
    "score": [1.85, 0.75, 0.53]
})
| | model | hpo1 | hpo2 | hpo3 | score |
|---|---|---|---|---|---|
| 0 | initial_model | NaN | NaN | NaN | 1.85 |
| 1 | add_features_model | NaN | NaN | NaN | 0.75 |
| 2 | hpo_model | NN: {num_epochs: 5} | GBM: {num_boost_round: 30} | XGB: {max_depth: 3} | 0.53 |
!pip install seaborn
Collecting seaborn
  Using cached seaborn-0.11.2-py3-none-any.whl (292 kB)
Requirement already satisfied: scipy>=1.0, numpy>=1.15, pandas>=0.23, matplotlib>=2.2 (and their dependencies)
Installing collected packages: seaborn
Successfully installed seaborn-0.11.2
import seaborn as sns
import plotly.express as px
import plotly.io as pio
pio.renderers.default = "plotly_mimetype+notebook"
sns.heatmap(train_features.corr())
<AxesSubplot:>
We can see two peaks in typical usage: one around 8am and a stronger one around 5pm-6pm (the commuter rush hours)
fig = px.bar(train.groupby(['hour'])['count'].mean(),title='Average number of bike shares by hour of day')
fig.show()
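The rush-hour peaks can also be read off programmatically with `nlargest` on the per-hour means. A sketch on toy data shaped like `train` (values invented to mirror the real pattern):

```python
import pandas as pd

# Toy data: average demand is highest at hours 17 and 8, mirroring the real pattern
toy = pd.DataFrame({
    'hour':  [7, 8, 8, 12, 17, 17, 17, 22],
    'count': [40, 300, 320, 120, 380, 400, 360, 60],
})

# nlargest(2) on the per-hour means returns the two busiest hours, largest first
peaks = toy.groupby('hour')['count'].mean().nlargest(2)
print(peaks.index.tolist())  # [17, 8]
```

Running the same two lines on the real `train` frame confirms what the bar chart shows visually.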
Between January and June, demand increases month on month; it peaks in the summer months and tails off from October
df_summary = pd.concat([train.groupby('month')['temp'].mean(),
                        train.groupby('month')['count'].mean()], axis=1)
# Both columns are monthly means, so label them accordingly
df_summary.columns = ['average_temp', 'average_shares']
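The same summary can be built in one pass with pandas named aggregation instead of concatenating two groupbys. A sketch on a toy frame with the same columns as `train`:

```python
import pandas as pd

# Toy frame with the same columns the real train data provides
toy = pd.DataFrame({
    'month': [1, 1, 6, 6],
    'temp':  [8.0, 10.0, 28.0, 30.0],
    'count': [50, 70, 300, 340],
})

# One groupby with named aggregations replaces the concat of two separate groupbys
df_summary = toy.groupby('month').agg(
    average_temp=('temp', 'mean'),
    average_shares=('count', 'mean'),
)
print(df_summary)
```

Named aggregation also sets the output column names in the same step, so no separate `.columns =` assignment is needed.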
Bike shares peak in the summer months, the warmest time of the year
fig_bar = px.bar(df_summary,color='average_temp',title='Average number of bike shares by month, hue denotes average monthly temperature')
fig_bar.show()